We quantitatively investigate how machine learning models leak information about the individual data records on which they were trained. We focus on the basic membership inference attack: given a data record and black-box access to a model, determine if the record was in the model's training dataset. To perform membership inference against a target model, we make adversarial use of machine learning and train our own inference model to recognize differences in the target model's predictions on the inputs that it trained on versus the inputs that it did not train on. We empirically evaluate our inference techniques on classification models trained by commercial "machine learning as a service" providers such as Google and Amazon. Using realistic datasets and classification tasks, including a hospital discharge dataset whose membership is sensitive from the privacy perspective, we show that these models can be vulnerable to membership inference attacks. We then investigate the factors that influence this leakage and evaluate mitigation strategies.
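The core idea can be illustrated with a minimal sketch: overfit a target classifier, then train a binary "attack" model to distinguish the target's confidence on training records (members) from its confidence on unseen records (non-members). This is a simplification for illustration only; the full technique described above builds the attack training set via shadow models rather than with ground-truth membership labels, and all datasets and model choices here are hypothetical.

```python
# Illustrative black-box membership inference on synthetic data.
# Assumption: the attack feature is the target's confidence in the
# record's true class (a simplification of the full prediction vector).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Synthetic data: half trains the target model (members), half is held out.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           random_state=0)
X_in, y_in = X[:500], y[:500]      # training records (members)
X_out, y_out = X[500:], y[500:]    # unseen records (non-members)

# Target model: fully grown trees overfit, so member/non-member
# confidences differ -- the signal the attack exploits.
target = RandomForestClassifier(n_estimators=50, random_state=0)
target.fit(X_in, y_in)

# Attack feature: target's predicted probability for the true label.
conf_in = target.predict_proba(X_in)[np.arange(len(y_in)), y_in]
conf_out = target.predict_proba(X_out)[np.arange(len(y_out)), y_out]
attack_X = np.concatenate([conf_in, conf_out]).reshape(-1, 1)
attack_y = np.concatenate([np.ones(len(conf_in)), np.zeros(len(conf_out))])

# Attack model: binary classifier predicting membership from confidence.
attack = LogisticRegression().fit(attack_X, attack_y)
acc = attack.score(attack_X, attack_y)
print(f"membership inference accuracy on synthetic data: {acc:.2f}")
```

If the attack accuracy is noticeably above 0.5 (random guessing), the target model is leaking membership information through its output confidences; mitigations such as regularization or restricting the prediction vector reduce this gap.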